Unit 7 - Inferential Statistics Workshop

This compulsory e-portfolio component demonstrates practical application of hypothesis testing and summary measures through analysis of real datasets. The exercises cover descriptive statistics, frequency analysis, paired and independent samples t-tests, and interpretation of statistical significance.

Exercise 7.1.1 & 7.1.2: Summary Measures for Diet B

Background

This exercise analyses weight loss data from Dataset B (Diets). The goal is to calculate summary measures for Diet B and compare them with Diet A to assess the relative effectiveness of the two weight reducing diets.

Figure: Exercise 7.1 Summary Measures Worksheet. Excel analysis showing summary statistics for Diet A and Diet B.

Results - Exercise 7.1.1 (Summary Statistics)

Diet B Summary Statistics:

Sample size (n) = 50
Sample mean weight loss = 3.710 kg
Sample standard deviation = 2.769 kg

Comparison with Diet A:

Diet A mean weight loss = 5.341 kg
Diet A standard deviation = 2.536 kg

Results - Exercise 7.1.2 (Quartiles)

Diet B Quartiles:

Median (Q2) = 3.745 kg
First Quartile (Q1) = 1.953 kg
Third Quartile (Q3) = 5.404 kg
Interquartile Range (IQR) = 3.451 kg

Comparison with Diet A:

Diet A Median = 5.642 kg
Diet A IQR = 3.285 kg
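As a rough cross-check outside Excel, the summary measures in Exercises 7.1.1 and 7.1.2 could be computed with Python's standard library. The sketch below is illustrative only: the `sample` values are made up (the Diet B observations are not reproduced here), `summarise` is a hypothetical helper name, and the `inclusive` quartile method is assumed to correspond to Excel's QUARTILE.INC. A different quartile convention would give slightly different Q1 and Q3 values.

```python
import statistics


def summarise(values):
    """Return the summary measures used in Exercise 7.1 for one sample."""
    # 'inclusive' interpolates between order statistics at position (n - 1) * p.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (divides by n - 1)
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Hypothetical weight losses in kg, NOT the Dataset B values.
sample = [1.2, 2.5, 3.1, 4.8, 6.0, 0.4, 3.9, 5.2]
summary = summarise(sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Running the same helper on each diet's column would give directly comparable measures for Diet A and Diet B.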

Interpretation

The summary measures suggest that, in this sample, Diet A was more effective than Diet B for weight reduction:

Central Tendency: The mean weight loss for Diet A (5.341 kg) is approximately 1.63 kg higher than that for Diet B (3.710 kg). Similarly, the median for Diet A (5.642 kg) exceeds Diet B's median (3.745 kg) by about 1.9 kg. Both measures of central tendency favour Diet A.

Variability: The standard deviations are similar (Diet A: 2.536 kg, Diet B: 2.769 kg), indicating comparable spread in weight loss outcomes for both diets. The IQRs are also similar (Diet A: 3.285 kg, Diet B: 3.451 kg), suggesting the middle 50% of participants experienced similarly variable results.

Practical Significance: Both diets produced positive average weight loss in the sample. However, mean weight loss on Diet A was approximately 44% greater than on Diet B (5.341 kg vs 3.710 kg), a difference large enough to be of practical interest. These are descriptive statistics only, so a formal hypothesis test would be needed to rule out sampling variation as the explanation. Subject to that check, the figures favour recommending Diet A over Diet B for individuals seeking to maximise weight loss.
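The comparison figures quoted in this interpretation follow directly from the reported summary measures. A minimal sketch of the arithmetic, using only values stated above (the variable names are illustrative):

```python
# Reported sample summary measures (kg) from the results above.
mean_a, mean_b = 5.341, 3.710
median_a, median_b = 5.642, 3.745

mean_diff = mean_a - mean_b                # absolute difference in means
median_diff = median_a - median_b          # absolute difference in medians
relative_gain = mean_diff / mean_b * 100   # Diet A excess as a % of the Diet B mean

print(f"Difference in means:   {mean_diff:.3f} kg")
print(f"Difference in medians: {median_diff:.3f} kg")
print(f"Diet A mean exceeds Diet B mean by {relative_gain:.0f}%")
```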

Exercise 7.1.3: Brand Preferences Analysis for Area 2

Background

This exercise analyses brand preference data from Dataset D (Brandprefs). The goal is to calculate frequencies and percentage frequencies for Area 2 respondents and compare brand preference patterns between the two demographic areas.

Figure: Frequency and percentage analysis for brand preferences across two areas.

Results - Area 2 Brand Preferences (n = 90)

Brand A: 19 respondents (21.1%)
Brand B: 30 respondents (33.3%)
Other brands: 41 respondents (45.6%)

Comparison with Area 1 (n = 70):

Brand A: 11 respondents (15.7%)
Brand B: 17 respondents (24.3%)
Other brands: 42 respondents (60.0%)
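The percentage frequencies above can be reproduced from the raw counts. The sketch below uses the counts reported for each area; `percentage_frequencies` is a hypothetical helper name, not something taken from the workbook.

```python
def percentage_frequencies(counts):
    """Convert a dict of category counts into percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


# Counts as reported in the results above.
area1 = {"Brand A": 11, "Brand B": 17, "Other": 42}   # n = 70
area2 = {"Brand A": 19, "Brand B": 30, "Other": 41}   # n = 90

pct1 = percentage_frequencies(area1)
pct2 = percentage_frequencies(area2)

for brand in area2:
    diff = pct2[brand] - pct1[brand]   # Area 2 minus Area 1, in percentage points
    print(f"{brand}: Area 1 {pct1[brand]:.1f}%, Area 2 {pct2[brand]:.1f}%, "
          f"difference {diff:+.1f} points")
```

Comparing percentages rather than counts matters here because the two areas have different sample sizes (70 and 90).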

Interpretation

The brand preference patterns differ notably between the two demographic areas:

Brand A: Shows higher preference in Area 2 (21.1%) compared to Area 1 (15.7%), a difference of 5.4 percentage points.

Brand B: Also shows higher preference in Area 2 (33.3%) compared to Area 1 (24.3%), a difference of 9.0 percentage points.

Other Brands: The preference for competitor brands is notably lower in Area 2 (45.6%) compared to Area 1 (60.0%), a difference of 14.4 percentage points.

Marketing Implications: These findings suggest that the manufacturer's brands (A and B) have stronger market penetration in Area 2, where over half of respondents (49 of 90, 54.4%) prefer either Brand A or Brand B. In Area 1, only 40.0% (28 of 70) prefer the manufacturer's brands. This could inform targeted marketing: Area 1 may require more intensive promotional effort to increase brand awareness and preference. These are sample percentages only; no significance test has been applied to the difference between areas.

Exercise 7.2.4: Filtration Two-Tailed Paired t-Test

Background

This exercise analyses filtration data from Dataset G. Each batch was split and filtered using two different agents, making a paired (related) samples t-test appropriate. The goal is to test whether the population mean impurity differs between the two filtration agents.

Figure: Exercise 7.2.4 Filtration Hypothesis Testing. Paired t-test analysis for filtration data comparing two agents.

Hypotheses

H₀: μ₁ = μ₂ (no difference in mean impurity between agents)
H₁: μ₁ ≠ μ₂ (mean impurity differs between agents)

Summary Statistics

Agent 1 mean impurity: 8.250 parts per 1000
Agent 2 mean impurity: 8.683 parts per 1000
Mean difference: -0.433 parts per 1000
SD of differences: 0.460

Test Statistics

t statistic: -3.264
Degrees of freedom: 11
p-value (two-tailed): 0.0075
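The paired t statistic is the mean difference divided by its standard error, t = d̄ / (s_d / √n). The sketch below recomputes it from the rounded summary statistics above, taking n = 12 as implied by the 11 degrees of freedom. The p-value is obtained by integrating the t density numerically with Simpson's rule, which is a stand-in chosen to keep the example dependency-free and is not how Excel computes it. Because the inputs are rounded, the result (about -3.26) differs slightly from the reported -3.264.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3           # P(0 < T < |t|)
    return 2 * (0.5 - central_area)


n = 12               # number of paired batches, inferred from df = 11
mean_diff = -0.433   # Agent 1 minus Agent 2, parts per 1000
sd_diff = 0.460      # sample SD of the paired differences

std_error = sd_diff / math.sqrt(n)
t_stat = mean_diff / std_error
df = n - 1
p_two = two_tailed_p(t_stat, df)

print(f"t = {t_stat:.3f}, df = {df}, two-tailed p = {p_two:.4f}")
```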

Interpretation

Statistical Conclusion: The two-tailed p-value (0.0075) is less than 0.05, so we reject the null hypothesis at the 5% significance level. There is significant evidence that the population mean impurity differs between the two filtration agents.

Practical Interpretation: Agent 1 produces a sample mean impurity of 8.250 parts per 1000, which is 0.433 parts per 1000 lower than Agent 2's mean of 8.683. This difference is statistically significant at both the 5% and 1% levels (p = 0.0075).

Recommendation: Since lower impurity indicates better filtration performance, Agent 1 appears to be the more effective filtration agent and should be preferred for applications where minimising impurity is important.

Exercise 7.2.2: Filtration One-Tailed Test

Background

Building on Exercise 7.2.4, we now conduct a one-tailed test to determine whether Filter Agent 1 is specifically more effective (produces lower impurity) than Agent 2.

Hypotheses

H₀: μ₁ ≥ μ₂ (Agent 1 impurity is greater than or equal to Agent 2)
H₁: μ₁ < μ₂ (Agent 1 is more effective—produces lower impurity)

Consistency Check

The sample data show Agent 1 mean (8.250) < Agent 2 mean (8.683). The data are consistent with H₁ (Agent 1 more effective).

Test Statistics

t statistic: -3.264
Degrees of freedom: 11
p-value (one-tailed): 0.0038

Interpretation

Statistical Conclusion: The one-tailed p-value (0.0038) is less than 0.01, so we reject the null hypothesis at the 1% significance level. There is strong evidence that Filter Agent 1 is more effective at removing impurities than Agent 2.

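Comparison with Two-Tailed Test: The one-tailed p-value (0.0038) is smaller than the two-tailed p-value (0.0075) because only the tail in the hypothesised direction is counted; the data and the t statistic are unchanged. When the sample mean difference lies in the predicted direction, the one-tailed p-value is exactly half the two-tailed p-value. A one-tailed test is only justified when the direction of the alternative is specified before the data are examined.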
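The relationship between the two p-values can be written as a small conversion rule, which relies on the symmetry of the t distribution. The helper name `one_tailed_p` and its `alternative` argument are illustrative choices, not part of the original analysis.

```python
def one_tailed_p(p_two_tailed, t_stat, alternative):
    """Convert a two-tailed t-test p-value into a one-tailed p-value.

    alternative: "less" for H1: mu1 < mu2, "greater" for H1: mu1 > mu2.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    in_predicted_direction = (t_stat < 0) if alternative == "less" else (t_stat > 0)
    half = p_two_tailed / 2
    # Halve the p-value only if the sample difference points the predicted way;
    # otherwise the one-tailed p-value is the complement, i.e. close to 1.
    return half if in_predicted_direction else 1 - half


p_less = one_tailed_p(0.0075, -3.264, "less")        # direction matches H1
p_greater = one_tailed_p(0.0075, -3.264, "greater")  # direction contradicts H1

print(f"H1: mu1 < mu2 -> p = {p_less:.5f}")
print(f"H1: mu1 > mu2 -> p = {p_greater:.5f}")
```

Had the alternative been stated in the opposite direction, the same data would give a p-value near 1, which is why the consistency check precedes the test.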

Conclusion: There is statistically significant evidence at the 1% level that Filter Agent 1 produces lower impurity levels, making it the preferred choice for filtration applications.

Exercises 7.2.3 & 7.2.5: Bank Cardholder Income Analysis

Background

This exercise analyses bank cardholder data from Dataset C (Superplus). The goal is to test whether the population mean income for male cardholders exceeds that of female cardholders. Since the male and female samples are independent, an independent samples t-test is appropriate.

Figure: Exercise 7.2.3 & 7.2.5 Bank Cardholder Income Analysis. Independent samples t-test comparing male and female cardholder incomes.

Step 1: F-test for Equality of Variances

Hypotheses

H₀: σ²_males = σ²_females (variances are equal)
H₁: σ²_males ≠ σ²_females (variances are unequal)

Results

Male variance: 233.129
Female variance: 190.176
F statistic: 1.226
df: (59, 59)
p-value (two-tailed): approximately 0.44

Conclusion

Since p ≈ 0.44 > 0.05, we fail to reject H₀. There is no significant evidence that the variances differ, so we proceed with the equal-variances (pooled) form of the independent samples t-test.
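The F statistic is simply the ratio of the two sample variances, with degrees of freedom n - 1 for each group. A minimal sketch using the reported variances and the group sizes of 60 is shown below. Placing the larger variance in the numerator is a common convention assumed here, and the p-value is not recomputed because that needs the F distribution.

```python
# Reported sample variances and group sizes.
var_male, var_female = 233.129, 190.176
n_male, n_female = 60, 60

# Convention assumed here: larger variance on top, so that F >= 1.
if var_male >= var_female:
    f_stat = var_male / var_female
    df = (n_male - 1, n_female - 1)
else:
    f_stat = var_female / var_male
    df = (n_female - 1, n_male - 1)

print(f"F = {f_stat:.3f} on {df} degrees of freedom")
```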

Step 2: Independent Samples t-Test (One-tailed)

Hypotheses

H₀: μ_males ≤ μ_females (male mean income does not exceed female)
H₁: μ_males > μ_females (male mean income exceeds female)

Summary Statistics

Male mean income: £52.913k (n = 60)
Female mean income: £44.233k (n = 60)
Difference in means: £8.680k
Pooled variance: 211.652

Test Statistics

t statistic: 3.268
Degrees of freedom: 118
p-value (one-tailed): 0.0007
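The pooled variance and t statistic above can be reproduced from the summary statistics. The sketch below applies the standard equal-variances formulas to the reported values; variable names are illustrative, and the p-value is left out because it requires the t distribution.

```python
import math

# Reported summary statistics (incomes in £ thousands).
n_m, n_f = 60, 60
mean_m, mean_f = 52.913, 44.233
var_m, var_f = 233.129, 190.176

df = n_m + n_f - 2
# Pooled variance: df-weighted average of the two sample variances.
pooled_var = ((n_m - 1) * var_m + (n_f - 1) * var_f) / df
# Standard error of the difference in means under equal variances.
std_error = math.sqrt(pooled_var * (1 / n_m + 1 / n_f))
t_stat = (mean_m - mean_f) / std_error

print(f"pooled variance = {pooled_var:.3f}, t = {t_stat:.3f}, df = {df}")
```

With equal group sizes the pooled variance reduces to the simple average of the two variances.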

Interpretation

Statistical Conclusion: The one-tailed p-value (0.0007) is less than 0.01, so we reject the null hypothesis at the 1% significance level. There is strong evidence that the population mean income for male Superplus Diamond cardholders exceeds that of female cardholders.

Practical Interpretation: Male cardholders in the sample have a mean income of £52,913, which is £8,680 higher than the female mean income of £44,233, i.e. about 19.6% higher.

Assumptions and Validation:

Independence: The samples are independently drawn. Each cardholder appears in only one group.
Normality: Income should be approximately normally distributed. With n=60 in each group, the Central Limit Theorem supports the validity of the t-test even if distributions are slightly non-normal. Validation could be performed using normal probability plots (Q-Q plots) or histograms for each group.
Equal Variances: The F-test (p ≈ 0.44) gives no significant evidence against equal variances, so this assumption is reasonable.
Random Sampling: We assume cardholders were randomly selected from the Superplus Diamond cardholder population.

